Edit Machines for Robust Multimodal Language Processing

Authors

  • Srinivas Bangalore
  • Michael Johnston
Abstract

Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, hand-crafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. Speech-only spoken language understanding systems have addressed this lack of robustness in hand-crafted grammars by using classification techniques to extract the fillers of a frame representation. In this paper, we illustrate the limitations of such classification approaches for multimodal integration and understanding, and present an approach based on edit machines that combines the expressiveness of multimodal grammars with the robustness of the stochastic language models used in speech recognition. We also present an approach in which the edit operations are trained from data using a noisy channel model paradigm. We evaluate and compare the performance of the hand-crafted and learned edit machines in the context of a multimodal conversational system (MATCH).
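The core idea of an edit machine, mapping an out-of-grammar input to the closest string the grammar accepts, can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes edit machines used together with multimodal grammars, whereas the toy below assumes a short hypothetical list of accepted word sequences (`GRAMMAR`), unit edit costs, and invented example utterances, and uses plain word-level edit distance.

```python
from typing import List, Sequence, Tuple


def edit_cost(source: Sequence[str], target: Sequence[str],
              ins: float = 1.0, dele: float = 1.0, sub: float = 1.0) -> float:
    """Word-level edit distance with configurable (assumed) operation costs."""
    m, n = len(source), len(target)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + dele      # delete every source word
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins       # insert every target word
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + dele,                          # deletion
                dist[i][j - 1] + ins,                           # insertion
                dist[i - 1][j - 1] + (0.0 if same else sub),    # match or substitution
            )
    return dist[m][n]


def closest_in_grammar(words: Sequence[str],
                       grammar_strings: List[List[str]]) -> Tuple[List[str], float]:
    """Return the accepted string reachable from `words` at the lowest edit cost."""
    best = min(grammar_strings, key=lambda g: edit_cost(words, g))
    return best, edit_cost(words, best)


# Hypothetical stand-in for the language of a hand-crafted grammar.
GRAMMAR = [
    "show cheap italian restaurants".split(),
    "show restaurants in this area".split(),
    "phone number for this restaurant".split(),
]

# Invented disfluent input that the toy grammar does not accept as-is.
noisy = "um show me cheap italian restaurants please".split()
best, cost = closest_in_grammar(noisy, GRAMMAR)
print(" ".join(best), cost)
```

In this sketch the costs are fixed by hand, which loosely corresponds to a hand-crafted edit machine. Under the noisy channel view mentioned in the abstract, the costs would instead be estimated from data; that step is not shown here.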


Similar resources

Articles: Robust Understanding in Multimodal Interfaces

Multimodal grammars provide an effective mechanism for quickly creating integration and understanding capabilities for interactive systems supporting simultaneous use of multiple input modalities. However, like other approaches based on hand-crafted grammars, multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent input. In this article, we show how the finite-sta...


Probabilistic Finite State Machines for Regression-based MT Evaluation

Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We also propose a novel pushdown automaton extension of the pFSM model for modeling word swapping a...


A Statistical Approach to Multimodal Natural Language Interaction

The Human-Centric Word Processor is a research prototype that allows users to create, edit and manage documents. Users can use real-time continuous speech recognition to dictate the contents of a document. Speech recognition is coupled with pen or mouse based input to facilitate all aspects of the command and control of the application. The system is multimodal, allowing the user to point and s...


Multimodal signal processing in naturalistic noisy environments

When a system must process spoken language in natural environments that involve different types and levels of noise, the problem of supporting robust recognition is a very difficult one. In the present studies, over 2,600 multimodal utterances were collected during both mobile and stationary use of a multimodal pen/voice system. The results confirmed that multimodal signal processing supports s...


OpenMM: An Open-Source Multimodal Feature Extraction Tool

The primary use of speech is in face-to-face interactions and situational context and human behavior therefore intrinsically shape and affect communication. In order to usefully model situational awareness, machines must have access to the same streams of information humans have access to. In other words, we need to provide machines with features that represent each communicative modality: face...



Publication date: 2006